Estrella Hurtado Alina Valliani
April 25, 2024
Our data describes Instagram accounts and their posts, and was downloaded from Kaggle.com.
It contains usernames, following and follower counts, likes, comments, and locations for different accounts.
We added several columns to the data, including engagement, engagement_quantile, post_timestamp, and caption_length.
This data is interesting because it includes a large sample of different accounts, which lets us draw conclusions about patterns in engagement scores and compare groups of accounts and posts.
We chose this data to understand Instagram engagement trends and the factors that contribute to engagement on posts and videos.
library(tidyverse) # Used to filter and reshape the data.
library(lubridate) # Used to parse and format dates.
library(stringr) # Provides functions for working with strings.
library(dplyr) # Provides a function for each basic verb of data manipulation.
library(plotly) # Used to make interactive graphs.
## Rows: 11,692
## Columns: 14
## $ owner_id <chr> "36063641", "36063641", "36063641", "36063641", "36063…
## $ owner_username <chr> "christendominique", "christendominique", "christendom…
## $ shortcode <chr> "C3_GS1ASeWI", "C38ivgNS3IX", "C35-Dd9SO1b", "C33TadDM…
## $ is_video <lgl> FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,…
## $ caption <chr> "I’m a brunch & Iced Coffee girlie☕️🍳 \n\nTop @ta3 X …
## $ comments <dbl> 268, 138, 1089, 271, 145, 143, 356, 132, 128, 884, 211…
## $ likes <dbl> 16382, 9267, 10100, 6943, 17158, 9683, 42906, 4287, 74…
## $ created_at <dbl> 1709326758, 1709241048, 1709154707, 1709065322, 170871…
## $ location <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ imageUrl <chr> "https://instagram.flba2-1.fna.fbcdn.net/v/t39.30808-6…
## $ multiple_images <lgl> TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,…
## $ username <chr> "christendominique", "christendominique", "christendom…
## $ followers <dbl> 2144626, 2144626, 2144626, 2144626, 2144626, 2144626, …
## $ following <dbl> 1021, 1021, 1021, 1021, 1021, 1021, 1021, 1021, 1021, …
engagement - the engagement score: likes plus comments as a percentage of the account's followers.
follower_quantile - the follower count divided into four roughly equal-sized groups (1 = fewest followers, 2 and 3 = mid-range, 4 = most followers).
engagement_quantile - the engagement score divided into four groups in the same way.
post_timestamp - the date and time when the picture or video was posted.
post_time - the posting time rounded to the nearest hour.
caption_length - the number of space-separated words in the caption.
new_data <- insta_data %>%
  mutate(
    # likes plus comments as a percentage of followers
    engagement = round(((likes + comments) / followers) * 100, digits = 2),
    # quartile groups: 1 = lowest, 4 = highest
    follower_quantile = ntile(followers, 4),
    engagement_quantile = ntile(engagement, 4),
    post_timestamp = as_datetime(created_at),
    post_time = format(round(post_timestamp, units = "hours"), format = "%H:%M"),
    caption_length = lengths(strsplit(caption, " "))
  )
Our original data was messy, so we added new columns with calculated values.
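As a quick sanity check of the engagement formula, the sketch below recomputes the score for the first three posts, using the likes, comments, and followers values printed in the raw data summary. The helper name `engagement_score` is hypothetical and only used for illustration; it uses base R only.

```r
# Engagement as a percentage of followers, rounded to two decimals.
# engagement_score is a hypothetical helper name, not part of the original code.
engagement_score <- function(likes, comments, followers) {
  round((likes + comments) / followers * 100, digits = 2)
}

# First three posts as printed in the glimpse() output of the raw data.
likes     <- c(16382, 9267, 10100)
comments  <- c(268, 138, 1089)
followers <- c(2144626, 2144626, 2144626)

scores <- engagement_score(likes, comments, followers)
print(scores)  # 0.78 0.44 0.52
```

These values agree with the first three entries of the engagement column in the updated data summary.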
## Rows: 11,692
## Columns: 20
## $ owner_id <chr> "36063641", "36063641", "36063641", "36063641", "3…
## $ owner_username <chr> "christendominique", "christendominique", "christe…
## $ shortcode <chr> "C3_GS1ASeWI", "C38ivgNS3IX", "C35-Dd9SO1b", "C33T…
## $ is_video <lgl> FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, T…
## $ caption <chr> "I’m a brunch & Iced Coffee girlie☕️🍳 \n\nTop @ta…
## $ comments <dbl> 268, 138, 1089, 271, 145, 143, 356, 132, 128, 884,…
## $ likes <dbl> 16382, 9267, 10100, 6943, 17158, 9683, 42906, 4287…
## $ created_at <dbl> 1709326758, 1709241048, 1709154707, 1709065322, 17…
## $ location <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ imageUrl <chr> "https://instagram.flba2-1.fna.fbcdn.net/v/t39.308…
## $ multiple_images <lgl> TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
## $ username <chr> "christendominique", "christendominique", "christe…
## $ followers <dbl> 2144626, 2144626, 2144626, 2144626, 2144626, 21446…
## $ following <dbl> 1021, 1021, 1021, 1021, 1021, 1021, 1021, 1021, 10…
## $ engagement <dbl> 0.78, 0.44, 0.52, 0.34, 0.81, 0.46, 2.02, 0.21, 0.…
## $ follower_quantile <int> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 1,…
## $ engagement_quantile <int> 3, 2, 3, 2, 3, 2, 4, 2, 2, 4, 2, 2, 1, 3, 4, 2, 3,…
## $ post_timestamp <dttm> 2024-03-01 20:59:18, 2024-02-29 21:10:48, 2024-02…
## $ post_time <chr> "21:00", "21:00", "21:00", "20:00", "20:00", "20:0…
## $ caption_length <int> 12, 34, 81, 57, 17, 66, 50, 17, 8, 53, 17, 20, 90,…
is_video - whether the post is a video (TRUE) or an image (FALSE).
caption - the caption text of the post.
comments/likes - the number of comments and likes the post received.
created_at - the Unix timestamp (seconds since 1970-01-01 UTC) of when the post was created.
multiple_images - a boolean indicating whether the post is a carousel (multiple-image upload).
followers/following - the number of accounts that follow the poster and the number of accounts the poster follows.
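To illustrate how created_at maps to a readable time, here is a minimal base R sketch using the first two created_at values from the data summary. It assumes the values are Unix seconds in UTC, which is also what lubridate's `as_datetime()` assumes by default.

```r
# created_at values taken from the first two rows of the glimpse() output.
created_at <- c(1709326758, 1709241048)

# Interpret the numbers as seconds since 1970-01-01 00:00:00 UTC.
post_timestamp <- as.POSIXct(created_at, origin = "1970-01-01", tz = "UTC")
stamp_text <- format(post_timestamp, "%Y-%m-%d %H:%M:%S")

# Round to the nearest hour and keep only HH:MM, mirroring post_time.
post_time <- format(round(post_timestamp, "hours"), "%H:%M")

print(stamp_text)  # "2024-03-01 20:59:18" "2024-02-29 21:10:48"
print(post_time)   # "21:00" "21:00"
```

Because the conversion is done in UTC, the resulting hours are UTC hours. If the creator's local time matters, the timestamp would need to be shifted to a chosen time zone first, for example with `lubridate::with_tz()`.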
The table below shows the average follower count in each follower quantile, where 1 is the lowest and 4 is the highest.
## # A tibble: 5 × 2
## follower_quantile follower_mean
## <int> <chr>
## 1 1 108,262
## 2 2 342,149
## 3 3 834,535
## 4 4 8,559,178
## 5 NA NA
Accounts with the fewest followers have the highest engagement, whereas accounts with the most followers have the lowest engagement.
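A comparison like this can be sketched with a grouped summary. The example below uses a small made-up data frame (the values in `toy` are hypothetical, not taken from the dataset) and drops rows with a missing follower count first, since `ntile()` returns NA for them and they would otherwise form their own NA group.

```r
library(dplyr)

# Hypothetical toy data; the real analysis would start from new_data.
toy <- tibble(
  followers  = c(100, 200, 1000, 2000, 10000, 20000, 100000, 200000, NA),
  engagement = c(12, 8, 6, 4, 3, 2, 1, 0.5, NA)
)

by_quantile <- toy %>%
  filter(!is.na(followers)) %>%                     # avoid an NA quantile row
  mutate(follower_quantile = ntile(followers, 4)) %>%
  group_by(follower_quantile) %>%
  summarise(
    follower_mean   = mean(followers),
    engagement_mean = mean(engagement),
    n_posts         = n()
  )

print(by_quantile)
```

In this toy data the mean engagement falls from quantile 1 to quantile 4, which is the pattern described above. Whether the real data shows the same pattern depends on the actual values.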
This table shows the average engagement percentage by posting hour (post_time, which as_datetime() reports in UTC unless a time zone is supplied).
## # A tibble: 24 × 3
## post_time `mean(engagement)` `n()`
## <chr> <dbl> <int>
## 1 00:00 2.08 261
## 2 01:00 1.88 267
## 3 02:00 1.71 238
## 4 03:00 2.99 178
## 5 04:00 2.37 138
## 6 05:00 3.34 125
## 7 06:00 2.38 145
## 8 07:00 1.59 185
## 9 08:00 3.81 223
## 10 09:00 1.93 293
## # ℹ 14 more rows
We see the most engagement at 5am, 8am, 12pm, 1pm, 4pm and 5pm, which are peak times of the day.
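One way to read the peak hours off a table like the one above is to sort the hourly summary by mean engagement. The sketch below does this on a few made-up rows (the values in `toy` are hypothetical) and keeps the post count next to each mean, because an hour with few posts can have a high average driven by a single post.

```r
library(dplyr)

# Hypothetical toy rows; the real analysis would use new_data.
toy <- tibble(
  post_time  = c("05:00", "05:00", "08:00", "09:00", "09:00"),
  engagement = c(3.0, 4.0, 3.8, 1.8, 2.0)
)

by_hour <- toy %>%
  group_by(post_time) %>%
  summarise(mean_engagement = mean(engagement), n_posts = n()) %>%
  arrange(desc(mean_engagement))   # highest-engagement hours first

print(by_hour)
```

Naming the summary columns (`mean_engagement`, `n_posts`) also avoids the backticked headers such as `mean(engagement)` that appear when the columns are left unnamed.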
Highest engagement posts include captions with lengths x & y.
The graph shows that short captions gain more engagement.
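This finding depends on how caption length is measured. The sketch below compares the space-splitting count used for caption_length with an alternative that counts runs of non-whitespace characters. The alternative is an assumption for illustration, not the authors' method, and the example captions are made up.

```r
library(stringr)

# Hypothetical captions: one with a line break, one with a double space, one missing.
captions <- c("brunch\n\ntime now", "two  spaces", NA)

# Approach used for caption_length: split on single spaces.
# Line breaks do not split words, double spaces add an empty piece,
# and a missing caption counts as length 1.
split_count <- lengths(strsplit(captions, " "))

# Alternative (assumption): count runs of non-whitespace characters.
word_count <- str_count(captions, "\\S+")

print(split_count)  # 2 3 1
print(word_count)   # 3 2 NA
```

The two counts differ only on captions with line breaks, repeated spaces, or missing text, so the overall trend is likely to be similar, but posts without a caption are worth checking separately.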
The graph clearly shows that single images get more engagement, whereas carousels get less.
We see that pictures get more comments and likes than videos.
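The comparison between single images, carousels, and videos can be sketched by deriving a post type from the is_video and multiple_images columns. The `post_type` column and the values in `toy` are hypothetical and only illustrate the approach.

```r
library(dplyr)

# Hypothetical toy data with the two logical columns from the dataset.
toy <- tibble(
  is_video        = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE),
  multiple_images = c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE),
  engagement      = c(1.0, 2.0, 5.0, 2.0, 3.0, 6.0)
)

by_type <- toy %>%
  mutate(post_type = case_when(
    is_video        ~ "video",         # checked first, so videos are never carousels here
    multiple_images ~ "carousel",
    TRUE            ~ "single image"
  )) %>%
  group_by(post_type) %>%
  summarise(mean_engagement = mean(engagement), n_posts = n())

print(by_type)
```

The order of the `case_when()` conditions is a design choice: a post flagged as both a video and a multiple-image upload is counted as a video in this sketch.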
The following examples give us a glimpse of one of the videos with the highest engagement, as well as an image post.
Here we notice that the account @therawtextures had 263,044 followers at the time of the data extraction and received 909,788 likes and 2,683 comments, for a 346% engagement score!
Below is an example of a single image that earned the creator @maaren_xx a 105% engagement score, with 3,222 likes and 45 comments on an account with 3,114 followers.
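The two scores quoted for these examples can be reproduced from the stated likes, comments, and follower counts. This is a small arithmetic check in base R; the helper name `engagement_pct` is hypothetical.

```r
# Likes plus comments as a percentage of followers (hypothetical helper name).
engagement_pct <- function(likes, comments, followers) {
  (likes + comments) / followers * 100
}

video_score <- engagement_pct(909788, 2683, 263044)  # @therawtextures video
image_score <- engagement_pct(3222, 45, 3114)        # @maaren_xx single image

print(round(video_score, 2))  # 346.89
print(round(image_score, 2))  # 104.91
```

The video score is about 346.9%, which the text reports as 346%, and the image score rounds to 105%. Scores above 100% mean the post received more likes and comments than the account has followers.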
In summary, our presentation emphasized the following findings from our Instagram accounts data:
The highest-engagement posts come from accounts with the fewest followers.
The highest-engagement posts are published in the early morning, around noon, and in the afternoon.
The highest-engagement posts have short captions.
Single images get more engagement than carousels and videos.
In general, pictures get more engagement than videos.